Sparsity Score: a Novel Graph-Preserving Feature Selection Method

Authors

  • Mingxia Liu
  • Daoqiang Zhang
Abstract

As thousands of features are available in many pattern recognition and machine learning applications, feature selection remains an important task to find the most compact representation of the original data. In the literature, although a number of feature selection methods have been developed, most of them focus on optimizing specific objective functions. In this paper, we first propose a general graph-preserving feature selection framework where graphs to be preserved vary in specific definitions, and show that a number of existing filter-type feature selection algorithms can be unified within this framework. Then, based on the proposed framework, a new filter-type feature selection method called sparsity score (SS) is proposed. This method aims to preserve the structure of a pre-defined l1 graph that is proven robust to data noise. Here, the modified sparse representation based on an l1-norm minimization problem is used to determine the graph adjacency structure and corresponding affinity weight matrix simultaneously. Furthermore, a variant of SS called supervised SS (SuSS) is also proposed, where the l1 graph to be preserved is constructed by using only data points from the same class. Experimental results of clustering and classification tasks on a series of benchmark data sets show that the proposed methods can achieve better performance than conventional filter-type feature selection methods.
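The abstract describes scoring each feature by how well it preserves an l1 graph built from sparse representations of the samples. The following is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. First, the l1 graph is approximated by a Lasso relaxation, min_s 0.5·||x_i − Σ_j s_j x_j||² + λ||s||₁ over the other samples, solved with ISTA (iterative soft-thresholding), instead of the paper's modified l1-norm minimization. Second, a feature's score is assumed to be its reconstruction error on that graph divided by its variance, with smaller scores ranked first. Third, passing `labels` restricts each reconstruction to same-class samples, mimicking the SuSS variant. All function names (`soft_threshold`, `sparse_code`, `l1_graph`, `sparsity_scores`, `rank_features`) and the parameters `lam` and `iters` are hypothetical.

```python
# Hypothetical sketch: a Lasso penalty solved by ISTA stands in for the paper's
# modified l1 minimization, and the variance-normalized score is an assumption.
import math
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrinks v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(target: Sequence[float], atoms: Sequence[Sequence[float]],
                lam: float, iters: int = 500) -> Vector:
    """ISTA for min_s 0.5*||target - sum_k s[k]*atoms[k]||^2 + lam*||s||_1."""
    n_atoms, dim = len(atoms), len(target)
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant,
    # so 1/L is a safe (if conservative) step size.
    L = sum(a * a for atom in atoms for a in atom) or 1.0
    s = [0.0] * n_atoms
    for _ in range(iters):
        recon = [sum(s[k] * atoms[k][d] for k in range(n_atoms)) for d in range(dim)]
        resid = [recon[d] - target[d] for d in range(dim)]
        grad = [sum(atoms[k][d] * resid[d] for d in range(dim)) for k in range(n_atoms)]
        s = [soft_threshold(s[k] - grad[k] / L, lam / L) for k in range(n_atoms)]
    return s


def l1_graph(X: Matrix, lam: float = 0.05, iters: int = 500,
             labels: Optional[Sequence[int]] = None) -> Matrix:
    """Row i holds the weights that reconstruct sample i from the other samples.

    If labels are given, only same-class samples are used (SuSS-style variant).
    """
    m = len(X)
    S = [[0.0] * m for _ in range(m)]
    for i in range(m):
        others = [j for j in range(m)
                  if j != i and (labels is None or labels[j] == labels[i])]
        coeffs = sparse_code(X[i], [X[j] for j in others], lam, iters)
        for j, c in zip(others, coeffs):
            S[i][j] = c
    return S


def sparsity_scores(X: Matrix, S: Matrix) -> Vector:
    """Assumed score: per-feature reconstruction error on the graph / feature variance."""
    m, n = len(X), len(X[0])
    scores = []
    for r in range(n):
        f = [X[i][r] for i in range(m)]
        mean = sum(f) / m
        var = sum((v - mean) ** 2 for v in f) / m
        err = sum((f[i] - sum(S[i][j] * f[j] for j in range(m))) ** 2 for i in range(m))
        # A constant feature carries no information; push it to the end of the ranking.
        scores.append(err / var if var > 0 else math.inf)
    return scores


def rank_features(X: Matrix, lam: float = 0.05, iters: int = 500,
                  labels: Optional[Sequence[int]] = None) -> List[int]:
    """Feature indices ordered from best (smallest score) to worst."""
    scores = sparsity_scores(X, l1_graph(X, lam, iters, labels))
    return sorted(range(len(scores)), key=lambda r: scores[r])
```

For a samples-by-features list of lists `X`, `rank_features(X)` returns feature indices from best to worst, and `rank_features(X, labels=y)` builds the graph from same-class samples only. ISTA is used here only because it needs nothing beyond the standard library. On real data a dedicated sparse-coding solver would be the more practical choice, and `lam` would need tuning.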

Similar resources

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach, multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...

A Novel One Sided Feature Selection Method for Imbalanced Text Classification

Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and might even treat minority-class data as outliers. The text data is one of t...

Iterative sparsity score for feature selection and its extension for multimodal data

As a key dimensionality reduction technique in pattern recognition, feature selection has been widely used in information retrieval, text classification and genetic data analysis. In recent years, using the structural information contained in samples to guide feature selection has become a new hot spot in the machine learning field. Although a tremendous number of feature selection methods have been developed, less i...

Epileptic seizure detection based on The Limited Penetrable visibility graph algorithm and graph properties

Introduction: Epileptic seizure detection is a key step for both researchers and epilepsy specialists in assessing epilepsy, owing to the non-stationarity and chaotic behavior of electroencephalogram (EEG) signals. Current research is directed toward the development of an efficient method for epilepsy or seizure detection based on the limited penetrable visibility graph (LPVG) algorith...

Rule Weight Optimization and Feature Selection in Fuzzy Systems with Sparsity Contraints

In this paper, we deal with a novel data-driven learning method (SparseFIS) for Takagi-Sugeno fuzzy systems, extended by including rule weights. Our learning method consists of three phases: the first phase conducts a clustering process in the input/output feature space with iterative vector quantization. Here, the number of clusters (= rules) is pre-defined and denotes a kind of upper b...

Journal title:
  • IJPRAI (International Journal of Pattern Recognition and Artificial Intelligence)

Volume 28

Publication date: 2014